Method and apparatus to enable efficient processing and transmission of network communications

ABSTRACT

A method and apparatus to enable efficient processing and transmission of network communications are described. A network transmission directed to one or more destination nodes within a network is received. One or more network transmission items are identified in the network transmission. One or more item signatures associated with the one or more network transmission items are generated. Finally, a determination is made whether the one or more network transmission items can be transmitted to the one or more destination nodes by further processing the one or more item signatures.

FIELD OF THE INVENTION

[0001] The present invention relates generally to computer/communication network management and, more particularly, to a method and apparatus to enable efficient processing and transmission of network communications.

BACKGROUND

[0002] Communication network gateway devices, such as network firewalls, govern the passage of information in and out of a given network or individual network nodes, such as a personal computer.

[0003] Presently, various types of network firewalls exist to prevent unauthorized communications from entering or leaving the network. Such communications are exchanged, for example, between internal nodes within the network and external nodes outside of the network. These network firewalls filter information based on address information (for example, an Internet Protocol (IP) address and respective port), communication protocol (for example, the User Datagram Protocol (UDP) or the Transmission Control Protocol (TCP)), or application protocol, in which case the state of an application is monitored to ensure that the application is requesting a communication channel in accordance with expected behavior.

[0004] Other tools that govern network communications include content filtering and virus detection tools.

[0005] Content filtering tools are typically employed to permit or prevent access to code, control, data, mobile code, application state, service state, machine state (including virtual machines), or other services (herein referred to as “information” or “content”). For example, content filtering can be used to permit or deny access to information, such as proprietary materials (e.g. company secrets), licensed content (e.g. movies or applications), or obscene or other objectionable material. Such filtering tools govern access to information based on content headers (e.g. magic numbers denoting file types), content signatures (e.g. cryptographic hashes over some portion of, or possibly the entire, payload), payload type, network addresses, keyword searches and pattern matching, or rating information provided by the author of the material or by a review board.

[0006] Virus detection tools are also used to filter network communications, often at network gateway choke points, to prevent the reception, infection, or transmission of malicious information (e.g. code or data) such as worms, viruses, Trojan horses, etc., between networks or network entities. In the context of communication networks, virus detection tools scan payloads, searching attachments or files for virus code. Typically, these searches look for specific strings or code segments identified by a master database.

[0007] Also in existence are services such as program execution and authentication control. For example, a computer wishing to launch an executable sequence must send a communication to a service on the network in order to determine if the user has permission to execute the program or, for example, to determine if the program has been illegally or maliciously tampered with. The service determines whether or not the program can execute. These and other prior methods either rely on a predetermined/pre-identified list or database to identify content (or classes of information), or apply only to specific information types requiring special payload structures, encodings, or packaging (e.g. digital rights management solutions).

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:

[0009] FIG. 1 is a block diagram of a conventional network infrastructure.

[0010] FIG. 2 is a block diagram of one embodiment of a processing system.

[0011] FIG. 3 is a block diagram of one embodiment of an apparatus to enable efficient processing and transmission of network communications.

[0012] FIG. 4 is a flow diagram of one embodiment of a method to enable efficient processing and transmission of network communications.

[0013] FIG. 5 is a flow diagram of an alternate embodiment of the method to enable efficient processing and transmission of network communications.

DETAILED DESCRIPTION

[0014] According to embodiments described herein, a method and apparatus to enable efficient processing and transmission of network communications are described.

[0015] In the following detailed description of embodiments of the invention, reference is made to the accompanying drawings in which like references indicate similar elements, and in which are shown by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that logical, mechanical, electrical, functional, and other changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims.

[0016] A network is a system of nodes connected by communication links. A node within the network may contain one or more logical devices. Devices may serve one or more roles such as being producers/providers or consumers/users of computation, storage, information and services, as well as providing or facilitating network processing and communication governance. Such devices may include, but are not limited to, supercomputers, mainframes, servers, workstations, desktop computers, Personal Digital Assistants (PDAs) and other hand-held computing or storage devices, gateway devices, cell phones, virtual machines, sensor nodes, software-implemented entities (e.g. web services), network-enabled services and devices, and networking fabric such as hubs, switches, routers, firewalls, etc. A node is any entity that receives or transmits information on the network, without regard for the physical medium (e.g. wired medium or wireless medium).

[0017] Communication networks, which include computer networks, often use protocol layers to facilitate communication between nodes. For example, the operation of most computer networks, such as the Internet, is commonly described using the standard Open Systems Interconnection (OSI) reference model. The OSI model has a layered architecture comprising physical (e.g. bits), data link (e.g. frames), network (e.g. packets), transport (e.g. connection-oriented paths), session (e.g. session messages), presentation (e.g. high-level messages), and application (e.g. message and user data) layers.

[0018] Communications between two or more nodes (e.g. computers, network devices, sensors, etc.) on a network, regardless of protocol stack level, may be facilitated by, or may pass through, multiple devices including, but not limited to, computers (e.g. proxy servers) and network devices (e.g. switches, routers, firewalls, etc.), and are herein called “network communications.”

[0019] During a network communication, the network transmits information from a source node to a destination node. Alternatively, a source node may multicast or broadcast to a set of destination nodes. Entities transmitted during a network communication may include, but are not limited to, all of or portions of articles and other content, such as files, documents, sensor readings, multimedia (e.g. music, video, images, voice), databases, compound articles (e.g. XML), archives, hierarchical articles, compressed articles, HTML web pages, encrypted articles, user identities, passwords, transactions, electronic money transfers, facsimiles, and architectural designs.

[0020] Such network transmissions include one or more “network transmission items.” Network transmission items may be selected from one network protocol layer or from multiple layers. Moreover, network transmission items may consist of multiple, but not necessarily contiguous, elements of the transmission. Similarly, network transmission items may also include portions of the data payloads, which also may further include compound or hierarchical articles.

[0021] For example, a network transmission item could encompass: the full data payload, the first 100 bytes of the data payload, every 3rd bit of the data payload, the source and destination node IP addresses and ports, protocol headers, protocol fields, subsets or portions of such elements (e.g. payloads, protocol fields, etc.), IP packets, user identity, user permissions, application state, or other protocol and external metadata. Alternatively, a network transmission item might include one or more of the above listed elements or none at all. Network transmission items include any subset, including the null set and the entirety, of a network transmission. A device, for example, may fail to recognize a network transmission item or may recognize that the transmission should be processed in the same manner as an unrecognized transmission item.
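The item examples in [0021] can be made concrete with a few extraction functions. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, bits are read most-significant-bit first, "every 3rd bit" is taken to mean bits 2, 5, 8, and so on, and the address/port item uses an ad hoc text serialization chosen only for illustration.

```python
def full_payload(payload: bytes) -> bytes:
    """The whole data payload as a single item."""
    return payload


def first_n_bytes(payload: bytes, n: int = 100) -> bytes:
    """The first n bytes of the payload (fewer if the payload is shorter)."""
    return payload[:n]


def every_kth_bit(payload: bytes, k: int = 3) -> list:
    """Bits k-1, 2k-1, 3k-1, ... of the payload, reading each byte MSB first."""
    bits = [(byte >> (7 - j)) & 1 for byte in payload for j in range(8)]
    return bits[k - 1::k]


def address_item(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bytes:
    """Source/destination addresses and ports as one item (hypothetical format)."""
    return f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()


payload = b"GET /index.html HTTP/1.1\r\n" * 10
items = [
    full_payload(payload),
    first_n_bytes(payload),
    bytes(every_kth_bit(payload)),  # one 0/1 value per byte of the item
    address_item("192.0.2.10", 51514, "198.51.100.7", 80),
    b"",  # the null item: nothing recognized
]
```

Each entry in `items` could then be handed independently to the identifier generation described later in this section.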

[0022] Although network transmission items can include any element of any protocol that constitutes the network transmission, within the scope of the OSI model, the description will address elements at the network layer (e.g. Internet Protocol (IP) packets), transport layer (e.g. TCP) and higher layers, such as the application layer.

[0023] A network communication may pass through or traverse one or more intermediate network nodes when moving (i.e. being sent or routed) from the source node to the destination node. Alternatively, the network communication may be routed from the source node directly to the destination node. Moreover, individual elements of the network communication, such as data packets, can take different routes when transmitted from the source node to the destination node.

[0024] Networks are organized into domains, also called realms, generally due to technical considerations (e.g. IP address assignments from private or public address blocks, and separation by shared network devices such as network gateway devices like routers) or policy considerations (e.g. security, administrative). Network domains can vary in size from a single node to multiple nodes and can also be hierarchical. The Internet, for instance, consists of many sub-domains.

[0025] Given a specific network node, an “internal network” identifies the collection of nodes that operate within the same network domain as the specific network node. “Internal nodes” refer to nodes within the same internal network. “External nodes” refer to any network node outside of the internal network, i.e. a network node that is in a different network domain.
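As a minimal sketch, assuming a network domain can be modeled as a single IP address prefix (real domains, as [0026] notes, may overlap or be defined by policy rather than by addressing), the internal/external designation reduces to a prefix-membership test with Python's standard `ipaddress` module. The prefix `192.168.1.0/24` and the function name `classify` are hypothetical.

```python
import ipaddress

# Hypothetical: the internal network domain is modeled as one private IPv4 prefix.
INTERNAL_DOMAIN = ipaddress.ip_network("192.168.1.0/24")


def classify(node_address: str, domain=INTERNAL_DOMAIN) -> str:
    """Label a node 'internal' if it lies in the given domain, else 'external'."""
    return "internal" if ipaddress.ip_address(node_address) in domain else "external"


print(classify("192.168.1.17"))  # internal
print(classify("203.0.113.5"))   # external
```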

[0026] It is possible for a network node to operate within or across multiple network domains or policy realms concurrently. The particular designation of internal network node versus external network node will be apparent to those skilled in the art from the usage context.

[0027] Some networks and network domains can be created over existing network topologies. Overlay networks, for example, may span multiple network domains for certain types of functionality. Examples of overlay networks include Groove Networks, Gnutella, and other known networks. Virtual private networks, for example, permit mobile or remote nodes to participate in an internal network domain by communicating with the internal network in a secure manner over external network infrastructure. In addition, network nodes may move from one network to another over a period of time, such as, for example, in the case of mobile laptops, handheld devices, and cell phones. In any of the above cases, however, network nodes are always subject to the policies of the local physical (wired or wireless) network domain to which they belong.

[0028] Communications between internal nodes and external nodes must usually traverse a network “gateway device.” The network gateway device relays potentially filtered or altered information from one network realm to another. The network gateway device may include, for example, software resident on a computer through which all network traffic to and from the computer passes. Similarly, a network gateway device may consist of one or several devices through which all computers on a private network communicate with the external world, such as the Internet. Network gateway devices often comprise multiple networking services, monitoring and checking facilities, and administrative functions including, but not limited to: firewalls, network address translation, security and integrity checks, network traffic logging, network bandwidth shaping, proxy services, and other known functions.

[0029] Network gateway devices often include content filtering mechanisms whose purpose is to prevent unauthorized entry or exit of network traffic and information. These checks may include, for example, payload virus scans and security checks such as, for example, searching for proprietary information surreptitiously embedded into photographic images or movies using steganographic techniques. Such checks have numerous outcomes, including the complete or partial passing of the transmission, or the denial of transmission.

[0030] Checks may also result in logging transmission information, alerts, human intervention, etc. It would be advantageous, therefore, to ensure that checks are only performed on network communications that require attention. Streamlining (optimizing) and, better yet, preventing unnecessary content checking improves overall system efficiency, cost and security.

[0031] In one embodiment, a device receives a network transmission being sent from a source node to a destination node. As the network transmission traverses the device, the device identifies one or more network transmission items within the network transmission. The device computes one or more item identifiers associated with the network transmission items. The device then accesses one or more databases using the item identifiers to determine further actions.

[0032] In one embodiment, actions include, but are not limited to, one or more of the following:

[0033] Admit (pass) the network transmission or network transmission items.

[0034] Deny (drop) the network transmission or network transmission items.

[0035] Further processing or rewriting of the network transmission (or network transmission items), which may include automated, computationally intensive inspection and analysis or tagging/alerts issued on items for human inspection and analysis.

[0036] Update stored metadata associated with a given item identifier.

[0037] Logging and statistics based on zero or more of the above activities.

[0038] It should be noted that one or more item identifiers may be used to determine the outcome of the network transmission as a whole or of some proper subset of the network transmission (e.g. some smaller number of network transmission items).
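The flow of [0031] through [0038] might be sketched as follows. This is an illustration only: the severity ordering of actions, the choice of SHA-1, the rule that unknown identifiers default to further inspection, and the rule that the most severe per-item action decides the whole transmission are all assumptions, and the names `Action`, `evaluate`, and `item_identifier` are hypothetical.

```python
import hashlib
from enum import IntEnum


class Action(IntEnum):
    """Per-item actions, ordered by severity (the ordering is an assumption)."""
    ADMIT = 0    # [0033] pass
    INSPECT = 1  # [0035] further processing or human review
    DENY = 2     # [0034] drop


def item_identifier(item: bytes) -> str:
    return hashlib.sha1(item).hexdigest()


def evaluate(items, database, log):
    """Look up each item's identifier and derive one outcome for the transmission.

    items:    network transmission items already extracted by a parser
    database: dict mapping item identifier -> Action
    log:      list that receives (identifier, action name) records ([0037])
    """
    outcomes = []
    for item in items:
        ident = item_identifier(item)
        action = database.get(ident, Action.INSPECT)  # unknown items get inspected
        log.append((ident, action.name))
        outcomes.append(action)
    # Whole-transmission outcome ([0038]): the most severe per-item action;
    # a transmission with no recognized items is sent for inspection.
    return max(outcomes, default=Action.INSPECT)


db = {item_identifier(b"ok"): Action.ADMIT, item_identifier(b"bad"): Action.DENY}
log = []
print(evaluate([b"ok", b"bad"], db, log).name)  # DENY
```

Taking the maximum is only one aggregation rule; a gateway could instead act on each item separately, for example stripping a denied attachment while admitting the rest of the message.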

[0039] In one embodiment, an item identifier is computed over the network transmission data payload and that item identifier is used to determine if the network transmission can be admitted (passed) or denied (dropped). For example, in this embodiment, actions may include: preventing company secret documents (item identifier known a priori) from being sent out of the corporate network; permitting a network-shippable home movie from Uncle Ray (item identifier not known a priori) to arrive and later, when the user forwards it to Cousin Mary, to pass through without the need for special attention or analysis; and preventing email from a malicious worm (item identifier known a priori) from entering the corporate network.
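The three scenarios of [0039] can be illustrated with a small cache keyed by payload identifier. In this hypothetical sketch, a priori known identifiers are denied outright, while an unknown payload is inspected once and its verdict stored (the metadata update of [0036]), so that a later forward of the same bytes passes without re-inspection. The inspection step is a stand-in that always admits, and the class `PayloadPolicy` is not from the source.

```python
import hashlib


def ident(payload: bytes) -> str:
    return hashlib.sha1(payload).hexdigest()


class PayloadPolicy:
    """Admit/deny decisions keyed by payload identifier (hypothetical design)."""

    def __init__(self, denied_a_priori):
        self.denied = set(denied_a_priori)  # e.g. company secrets, known worms
        self.learned = {}                   # identifier -> verdict after first inspection
        self.inspections = 0

    def inspect(self, payload: bytes) -> str:
        # Stand-in for costly analysis (virus scan, steganography check, etc.).
        self.inspections += 1
        return "admit"

    def decide(self, payload: bytes) -> str:
        i = ident(payload)
        if i in self.denied:
            return "deny"
        if i not in self.learned:           # first sighting: inspect once, remember
            self.learned[i] = self.inspect(payload)
        return self.learned[i]


policy = PayloadPolicy({ident(b"company secret document"), ident(b"worm e-mail body")})
movie = b"home movie from Uncle Ray"
print(policy.decide(movie), policy.decide(movie), policy.inspections)  # admit admit 1
```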

[0040] In one embodiment, a restricted set of applications, protocols, or content/article/data types may be used. For example, a system administrator may wish to prevent users from receiving application binaries for deprecated uses. Or, for example, it may be required to prevent the transmission of pictures of employees (e.g. GIFs stored for identification badges) out of the internal network.

[0041] FIG. 1 is a block diagram of a conventional network infrastructure. Referring to FIG. 1, the block diagram illustrates a network environment in which the present invention operates. In this conventional network infrastructure, a server computer processing system 104 is coupled to a network 100, such as, for example, a wide-area network (WAN). Wide-area network 100 includes the Internet, or other proprietary networks, such as America Online™ or Microsoft Network™, each of which is well known to those of ordinary skill in the art. Wide-area network 100 may also include conventional network backbones, long-haul telephone lines, Internet service providers, various levels of network routers, and other conventional means for routing data between network devices.

[0042] Using conventional network protocols, server 104 may communicate through the wide-area network 100 to a plurality of client computer processing systems 102, connected to the wide-area network 100 in various ways or directly connected to server 104. For example, as shown in the embodiment of FIG. 1, client 102 is connected directly to the wide-area network 100 through a digital broadband connection, a direct or dial-up telephone connection, or another network transmission line.

[0043] In another alternate network topology, wide-area network, or external network, 100 may communicate with a local area network, or internal network, 106 through a network gateway device 108. The gateway device 108 is used to route data to clients 103 through the local area network 106. Clients 103 may communicate with each other through the local area network 106, or with server 104 through the gateway device 108 and the wide-area network 100. In another alternate embodiment (not shown), the server 104 and the clients 103 may be connected to each other through the local area network 106, and not through the wide area network 100. In addition, although the local area network 106 is defined by the network gateway device 108, two clients 103 within the local area network 106 may be separated by another gateway device (not shown) such as, for example, a router, or a Network Address Translation (NAT) device.

[0044] Using one of a variety of network connection devices, server computer 104 can communicate data directly with clients 102 or 103. In a particular implementation of this network configuration, a server computer 104 may operate as a web server if the Internet is used as wide-area network 100. Using the Hyper Text Transfer Protocol (HTTP) and the Hyper Text Markup Language (HTML) across a network, web server 104 may communicate across the Web with clients 102 or 103. In this configuration, a client 102 or 103 uses a client application program known as a web browser, such as the Netscape Navigator™ browser, published by America Online™, the Internet Explorer™ browser, published by Microsoft Corporation of Redmond, Wash., the user interface of America Online™, or the web browser or HTML translator of any other conventional supplier. Using such conventional browsers and the Web, the client 102 or 103 may access graphical and textual data or video, audio, or tactile data provided by server 104. Conventional means exist by which the client 102 or 103 may supply information to web server 104 through the network 100 and the web server 104 may return processed data to the client 102 or 103.

[0045] Having briefly described one embodiment of the network environment in which the present invention operates, FIG. 2 shows one embodiment of a computer processing system, which illustrates an exemplary client 102, 103, server 104, or gateway device 108 system in which the features of the present invention may be implemented.

[0046] In one embodiment of the invention, the processing system 200 includes a system bus 201, or other communications module similar to the system bus, for communicating information, and a processing module, such as processor 202, coupled to bus 201 for processing information. Processing system 200 further includes a main memory 204, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 201, for storing information and instructions to be executed by processor 202. Main memory 204 may also be used for storing temporary variables or other intermediate information during execution of instructions by processor 202.

[0047] Processing system 200 also comprises a read only memory (ROM) 206, and/or other similar static storage device, coupled to bus 201, for storing static information and instructions for processor 202.

[0048] An optional data storage device 207, such as a magnetic disk or optical disk, and its corresponding drive, may also be coupled to the processing system 200 for storing information and instructions. System bus 201 is coupled to an external bus 210, which connects the processing system 200 to other devices. In one embodiment, processing system 200 can be coupled via bus 210 to a display device 221, such as a cathode ray tube (CRT) or a liquid crystal display (LCD), for displaying information to a computer user. For example, graphical or textual information may be presented to the user on display device 221. Typically, an alphanumeric input device 222, such as a keyboard including alphanumeric and other keys, is coupled to bus 210 for communicating information and/or command selections to processor 202. Another type of user input device is cursor control device 223, such as a conventional mouse, touch mouse, trackball, or other type of cursor direction keys, for communicating direction information and command selection to processor 202 and for controlling cursor movement on display 221. In one embodiment, processing system 200 may optionally include video, camera, speakers, microphones, sound card, and many other similar conventional options and transducers.

[0049] A communication device 224 is also coupled to bus 210 for accessing remote computers or servers, such as server 104, or other servers via the Internet, for example. The communication device 224 may include a modem, a network interface card, or other well-known interface devices, such as those used for interfacing with Ethernet, Token-ring, or other types of networks. In any event, in this manner, the processing system 200 may be coupled to a number of servers 104 via a conventional network infrastructure such as the infrastructure illustrated in FIG. 1 and described above.

[0050] FIG. 3 is a block diagram of one embodiment of an apparatus to enable the efficient processing and transmission of network communications, such as one embodiment of the network gateway device 108 of FIG. 1. As illustrated in FIG. 3, in one embodiment, network gateway device 108 receives information, such as, for example, packets containing data, from an external source node within an external network, such as, for example, a client 102 or server 104 within wide area network (WAN) 100. The information is transmitted during a network communication between the external source node 102 or 104 and an internal destination node, such as, for example, a client 103 within the local area network (LAN) 106.

[0051] The gateway device 108 receives the network transmission along an inbound path 301 and routes the transmission to the internal destination node, such as client 103 within LAN 106. Alternatively, the gateway device 108 may receive a network transmission from an internal source node within LAN 106 along an outbound path 302 and may transmit the information to an external destination node within WAN 100. It is to be understood that the inbound path, outbound path, internal and external networks are logical designations. In one embodiment, the inbound and outbound paths share the same physical medium (e.g. a CAT-5 cable). In alternate embodiments, the internal and external networks may be the same physical network distinguished by policy access or usage policy enforcement devices (e.g. a software firewall on a node). In another alternate embodiment, the external network may be the public Internet, while the internal network may be encrypted traffic within a Virtual Private Network (VPN) using the Internet as a communication layer.

[0052] In one embodiment, the gateway device 108 includes an inbound communication parser 311 coupled to the inbound path 301 to receive and to process the data payloads from the external source node 102 or 104 and further includes an outbound communication parser 312 coupled to the outbound path 302 to receive and to process the data payloads from the internal source node 103. Alternatively, the gateway device 108 may include only one communication parser, which replaces parsers 311 and 312, to receive and to process the data payloads from both the inbound path 301 and the outbound path 302. The communication parsers 311 and 312 process the received data payloads and identify one or more network transmission items within one or more data payloads. In one embodiment, the identified network transmission items are data fields of specified types, for example File Transfer Protocol (FTP) payloads, electronic mail attachments, network file services, or eXtensible Markup Language (XML) documents. In one embodiment, each network transmission item is identified using pattern matching or using one or more of a number of known network communication (e.g. protocol stack) parsing mechanisms.

[0053] A “data payload” represents a collection of data identified by the transport protocol (e.g. an XML file). A network transmission data item describes an element of data that may be transmitted by the application either within an application layer protocol or within a defined file format utilized by the application. In some instances the data payload may be identical to the network transmission item (e.g. an HTML file), while in other instances the data payload may contain multiple items (e.g. MIME attachments in an e-mail message). Both data payloads and their constituent items are typically independent of the underlying network technologies and protocols.
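For the e-mail case, where one data payload carries several items, Python's standard `email` package can split a message into its MIME attachments. The sketch below builds a sample message and treats each attachment's filename and decoded bytes as one network transmission item; the helper names and the sample contents are assumptions made for illustration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_sample_message() -> bytes:
    """A two-attachment e-mail used only for illustration."""
    msg = EmailMessage()
    msg["Subject"] = "Badge photos"
    msg.set_content("See attached.")
    msg.add_attachment(b"GIF89a...", maintype="image", subtype="gif",
                       filename="badge.gif")
    msg.add_attachment(b"%PDF-1.4...", maintype="application", subtype="pdf",
                       filename="notes.pdf")
    return msg.as_bytes()


def attachment_items(raw_message: bytes):
    """Split one e-mail data payload into (filename, decoded bytes) items."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    return [(part.get_filename(), part.get_content())
            for part in msg.iter_attachments()]


for name, data in attachment_items(build_sample_message()):
    print(name, len(data))
```

Because the attachments are base64-encoded on the wire, decoding them first means the item contents, and therefore any identifier computed over them, do not depend on the transfer encoding.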

[0054] For some network technologies, a similar relationship may exist between the unit of transfer at the network layer and the transport layer. For example, in a packet switched network, a single network packet may contain one or more data payloads, or the contents of multiple network data packets may be combined to form a single data payload.
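The many-packets-to-one-payload case can be sketched as byte-stream reassembly. This is a simplified illustration, not a TCP implementation: offsets are assumed to be byte positions relative to the start of the payload (real TCP sequence numbers start at a random initial value and wrap at 2^32), duplicates are dropped, and a gap raises an error instead of waiting for more data. The function name is hypothetical.

```python
def reassemble(segments):
    """Combine (offset, data) segments, possibly out of order or duplicated,
    into one contiguous data payload starting at offset 0.

    Simplification: offsets are byte positions relative to the start of the
    payload rather than wrapping 32-bit TCP sequence numbers."""
    buffer = bytearray()
    for offset, data in sorted(segments):
        end = offset + len(data)
        if end <= len(buffer):
            continue  # fully duplicated segment, e.g. a retransmission
        if offset > len(buffer):
            raise ValueError(f"gap before offset {offset}")
        buffer += data[len(buffer) - offset:]  # append only the new bytes
    return bytes(buffer)


packets = [(6, b"world"), (0, b"hello "), (0, b"hello ")]
print(reassemble(packets))  # b'hello world'
```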

[0055] In one embodiment, the gateway device 108 further includes an identifier generator coupled to each communication parser 311 and 312, such as, for example, signature generator 320. The signature generator 320 receives the network transmission items identified by the communication parsers 311, 312 and generates one or more item identifiers corresponding to each network transmission item. The item identifier for the corresponding network transmission item includes a value associated with the network transmission item. In one embodiment, the item identifier includes an item signature of the network transmission item's contents.

[0056] In one embodiment, the signature generator uses a cryptographically secure hash algorithm, for example the SHA1 algorithm, or a message digest algorithm, for example the MD5 algorithm, to generate the item identifier. It is to be understood, however, that other cryptographic algorithms, content-derived or attribute-derived signaturing algorithms (e.g., cyclic redundancy check (CRC) checksums), or compression mechanisms may be used to generate the item identifier. Collectively, herein, these identifier calculation methods are referred to as “content-signaturing” or simply “signaturing” mechanisms, and the output of the methods is referred to as a “content signature” or “signature.” In addition, it is to be understood that any portions of or any number of the above mentioned generation mechanisms may be combined together, or used separately, to generate the respective item identifiers.
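Each signaturing mechanism named in [0056] is available in the Python standard library: SHA-1 and MD5 through `hashlib`, and CRC-32 through `zlib.crc32`. The sketch below (function name hypothetical) computes all three over one item. Practical collision attacks have been published against both MD5 and SHA-1, so where an adversary might deliberately craft aliasing items, a SHA-2 family hash such as SHA-256 is the more conservative choice; CRC-32 detects accidental corruption only and offers no resistance to deliberate collisions.

```python
import hashlib
import zlib


def content_signatures(item: bytes) -> dict:
    """Compute several content signatures over one network transmission item."""
    return {
        "sha1": hashlib.sha1(item).hexdigest(),     # 160-bit secure hash
        "md5": hashlib.md5(item).hexdigest(),       # 128-bit message digest
        "crc32": format(zlib.crc32(item), "08x"),   # 32-bit checksum
    }


print(content_signatures(b"abc"))
```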

[0057] In one embodiment, a number of signatures may be combined, through a variety of mechanisms (e.g. concatenation, Boolean bit-wise exclusive-OR'ing, etc.), to form an item identifier. By using one or more content-signaturing mechanisms to generate item identifier values, the item identifier uniquely identifies, for all practical purposes, the contents of a particular network transmission item, not just within that network transmission or network communication, but also across the sample space of all network communications.
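The two combination mechanisms mentioned in [0057] can be sketched as follows. Truncating SHA-1 to 16 bytes so that it can be XOR'ed with the 16-byte MD5 digest is an assumption of this illustration, as are the function names.

```python
import hashlib


def concatenated_identifier(item: bytes) -> bytes:
    """SHA-1 digest followed by MD5 digest: 20 + 16 = 36 bytes."""
    return hashlib.sha1(item).digest() + hashlib.md5(item).digest()


def xor_identifier(item: bytes) -> bytes:
    """Bit-wise XOR of MD5 with the first 16 bytes of SHA-1 (truncation assumed)."""
    sha1_part = hashlib.sha1(item).digest()[:16]
    md5_part = hashlib.md5(item).digest()
    return bytes(a ^ b for a, b in zip(sha1_part, md5_part))


print(concatenated_identifier(b"report.pdf").hex())
print(xor_identifier(b"report.pdf").hex())
```

One design difference is worth noting: two items collide under the concatenated identifier only if they collide under both hashes at once, whereas the shorter XOR combination carries no such guarantee.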

[0058] In one embodiment, item identifiers are used to establish a partition over the set of network transmission items examined by the gateway device 108 using an identifier-based equivalence relation. For example, two network transmission items are considered to be “equivalent” and belong to the same block of the partition if they have the same item identifier. By definition, partition blocks contain mutually exclusive elements, each partition block is non-empty, and the union of all blocks in a partition is the set of network transmission items examined by the gateway device 108. However, if two network transmission items have identical item identifiers, it does not necessarily mean that the network communications or the network transmissions (or other associated metadata) to which they belong or with which they are associated are the same. In one embodiment, the identifier-based equivalence relation produces the same partitioning of the set of network transmission items examined by the gateway device 108 as does the equivalence relation produced by pair-wise comparison of the respective network transmission item contents (e.g. bit-wise comparisons). These two partitions can differ if identifier aliasing occurs. Aliasing of item identifiers occurs when two network transmission items with differing contents have an identical item identifier assigned to them. Although technically possible, aliasing is highly unlikely with the choice of a robust content-signaturing mechanism.
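The partition of [0058] corresponds to grouping items by identifier. In this sketch (SHA-1 and the function names are assumptions), a block containing two different byte strings would indicate identifier aliasing, i.e. a point where the identifier-based partition departs from the partition given by bit-wise comparison.

```python
import hashlib
from collections import defaultdict


def partition_by_identifier(items):
    """Group items into blocks keyed by item identifier (SHA-1 hex digest)."""
    blocks = defaultdict(list)
    for item in items:
        blocks[hashlib.sha1(item).hexdigest()].append(item)
    return dict(blocks)


def aliasing_detected(blocks) -> bool:
    """True if some block mixes items with different contents, i.e. the
    identifier partition differs from the bit-wise content partition."""
    return any(len(set(block)) > 1 for block in blocks.values())


seen = [b"movie.mp4 bytes", b"report.pdf bytes", b"movie.mp4 bytes"]
blocks = partition_by_identifier(seen)
print(len(blocks), aliasing_detected(blocks))  # 2 False
```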

[0059] To decrease the probability of identifier aliasing, in one embodiment, the item identifier includes a combination of one or more cryptographic hashes, augmented by a content signature generated from a subset of the network transmission item contents or from metadata associated with the network transmission item. For example, a content signature generated by the MD5 message digest algorithm may be augmented with network transmission item size information (appropriately formatted by a content-signaturing mechanism) to produce an item identifier that is more robust than one derived only from the contents (e.g., a cryptographic hash of the network transmission item contents) or only from protocol state or metadata concerning the contents (e.g., content length). As other items of protocol state or other metadata are introduced, system correctness is maintained, but efficiency is reduced, as multiple network transmission items with the same contents but different metadata may no longer be linked.
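A minimal sketch of the MD5-plus-size identifier of [0059], assuming a fixed-width hexadecimal size field and a `|` separator (both illustrative formatting choices; the function name is hypothetical). The optional metadata argument shows the efficiency trade-off described above: identical contents with different file names no longer share an identifier.

```python
import hashlib


def item_identifier(item: bytes, metadata=None) -> str:
    """MD5 of the contents, augmented with the size and any extra metadata.

    The '|' separator and 16-hex-digit size field are illustrative choices."""
    parts = [hashlib.md5(item).hexdigest(), format(len(item), "016x")]
    for key in sorted(metadata or {}):
        parts.append(f"{key}={metadata[key]}")
    return "|".join(parts)


same_contents = b"quarterly numbers"
print(item_identifier(same_contents) == item_identifier(same_contents))
print(item_identifier(same_contents, {"filename": "a.xls"})
      == item_identifier(same_contents, {"filename": "b.xls"}))
```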

[0060] In order to maximize the amount of traffic that can be allowed without further processing, in one embodiment, network transmission items of certain types may undergo one or more normalizing transformations into a standard, possibly canonical, format prior to calculation of content item identifiers. In this embodiment, content identifiers may be generated based on the normalized representation's contents and metadata. For example, a network transmission item containing a file in a compressed format (e.g. a file compressed using one of many known compression algorithms) may be transformed (i.e. decompressed) so that the content identifier is generated from the expanded contents and expanded size.
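As an illustration of [0060], restricted to gzip for brevity, an item whose first two bytes match the gzip magic number is decompressed before hashing, so the identifier depends only on the expanded contents and expanded size. The function name, the SHA-1-plus-size format, and the gzip-only detection are assumptions; an uncompressed item that happened to begin with those two bytes would cause `gzip.decompress` to raise an error.

```python
import gzip
import hashlib

GZIP_MAGIC = b"\x1f\x8b"


def normalized_identifier(item: bytes) -> str:
    """Identifier computed over the expanded contents and expanded size."""
    if item[:2] == GZIP_MAGIC:
        item = gzip.decompress(item)
    return hashlib.sha1(item).hexdigest() + format(len(item), "016x")


original = b"project plan\n" * 200
fast = gzip.compress(original, compresslevel=1)
best = gzip.compress(original, compresslevel=9)
print(normalized_identifier(fast) == normalized_identifier(original))
print(normalized_identifier(best) == normalized_identifier(original))
```

Because the gzip header embeds a timestamp and the compressed bytes depend on the compression level, two compressed copies of the same file can differ byte for byte, whereas their normalized identifiers match.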

[0061] In one embodiment, network transmission items of certain types may be transformed to expose a collection of constituent network transmission items contained within a single network transmission item. For example, a network transmission item containing a file in a compressed format may be replaced with the metadata and contents of the network transmission items that make up the compressed file. Examples of such compound and/or hierarchical network transmission items, or network transmission item collections, include, but are not limited to, archives (shell, tar, library, etc.) and multi-resolution representations of multimedia (e.g. different compression rates for music, video, or still pictures). In one embodiment, the gateway device could use this kind of decomposition to selectively filter and process hierarchical or compound network transmission items.
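
To illustrate decomposition, the sketch below treats a tar archive (the text's own example) as a compound item and exposes each regular file as a separate item with its own size and SHA-256 signature. Skipping non-file members and the choice of hash are assumptions; `build_tar` is a hypothetical helper used only to produce test input.

```python
import hashlib
import io
import tarfile


def decompose_tar(contents: bytes) -> list[tuple[str, int, str]]:
    """Expose each regular file in a (possibly compressed) tar archive as an item.

    Returns one (member name, size, SHA-256 hex digest) tuple per constituent item.
    """
    items = []
    # "r:*" lets tarfile detect gzip/bz2/xz compression transparently
    with tarfile.open(fileobj=io.BytesIO(contents), mode="r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and device entries
            data = archive.extractfile(member).read()
            items.append((member.name, member.size, hashlib.sha256(data).hexdigest()))
    return items


def build_tar(files: dict[str, bytes]) -> bytes:
    """Pack name -> bytes pairs into an in-memory, uncompressed tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

Each returned signature can then be looked up individually, so one disallowed file inside an otherwise known archive is still caught.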

[0062] A content-signaturing mechanism identifier may be associated with a particular content-signaturing mechanism. This versioning information can prevent cross-algorithm aliasing and permits a system to be migrated to an improved item identifier generation mechanism, if desired (e.g., over time, the system may change the mechanism in use). The identifier may also be used to support systems in which multiple content-signaturing algorithms are in use simultaneously. The identifier may be used as metadata in computing content signatures, thereby implicitly including it in every item identifier generated. Alternatively, as discussed below, the identifier may be stored explicitly rather than used in the computation of item identifiers.
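
Both variants described above can be sketched as follows, assuming a hypothetical registry that maps small integer mechanism identifiers to hash constructors. The implicit variant folds the identifier into the hashed input; the explicit variant keeps it as a separate field next to a plain digest.

```python
import hashlib

# Hypothetical registry: content-signaturing mechanism identifier -> hash constructor
MECHANISMS = {1: hashlib.md5, 2: hashlib.sha256}


def implicit_identifier(contents: bytes, mechanism_id: int) -> str:
    """Fold the mechanism identifier into the hashed input (implicit variant)."""
    h = MECHANISMS[mechanism_id]()
    h.update(mechanism_id.to_bytes(2, "big"))  # 2-byte prefix is an assumption
    h.update(contents)
    return h.hexdigest()


def explicit_identifier(contents: bytes, mechanism_id: int) -> tuple[int, str]:
    """Store the mechanism identifier next to a plain digest (explicit variant)."""
    return mechanism_id, MECHANISMS[mechanism_id](contents).hexdigest()
```

With the explicit form, a lookup compares both tuple fields, so a version-1 digest can never match a version-2 entry; this is what allows old and new mechanisms to coexist during a migration.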

[0063] In one embodiment, a single, possibly reserved, identifier is used to represent a null network transmission item. Null network transmission items are most likely caused by a system failure or some other shortcoming within the network. In such cases, the entire network transmission is more likely to be flagged for further logging, processing, and possibly adjustment of the network transmission item recognition/parsing algorithms.

[0064] Referring back to FIG. 3, the network gateway device 108 further includes one or more databases coupled to the signature generator 320, such as the signature database 330. The signature database 330 stores each item signature generated by the signature generator 320 and other metadata information. The metadata may be information garnered from the network transmissions, a history of network transmissions, information pointing to related network transmissions (or network transmission items), logging and statistics information, administrative and security policies, etc. Examples of metadata include source and destination host IP addresses, network transmission and network transmission item handles/identifiers (e.g. to find items stored elsewhere), transmission direction and admit (pass) or deny (drop) policy (e.g. deny forwarding of a network transmission if a given signature is seen), counts of the number of times a signature is accessed, last update timestamp, access permission lists, etc.
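
A minimal in-memory sketch of a signature database record carrying a few of the metadata fields listed above. The field names, the string encoding of direction and policy, and the choice to count accesses on lookup are all assumptions for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SignatureRecord:
    # Hypothetical field names for metadata kept alongside each signature
    signature: str
    source_ip: str
    destination_ip: str
    direction: str            # "inbound" or "outbound"
    policy: str = "admit"     # "admit" (pass) or "deny" (drop)
    access_count: int = 0
    last_update: float = field(default_factory=time.time)


class SignatureDatabase:
    """Dictionary-backed stand-in for the signature database."""

    def __init__(self) -> None:
        self._records: dict[str, SignatureRecord] = {}

    def insert(self, record: SignatureRecord) -> SignatureRecord:
        # Keep an existing record if the signature is already present
        return self._records.setdefault(record.signature, record)

    def lookup(self, signature: str) -> Optional[SignatureRecord]:
        record = self._records.get(signature)
        if record is not None:
            record.access_count += 1  # count accesses per signature
        return record
```

A production deployment might instead place these records in an external SQL database, as paragraph [0065] allows; the dictionary merely keeps the sketch self-contained.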

[0065] In one embodiment, the database consists of data stored in the memory units of the gateway device 108. In alternate embodiments, the database may be split between the gateway device 108 and an external unit, or stored entirely in an external database (not shown) and accessed through a connection to a separate device (e.g. an external SQL Server).

[0066] The gateway device 108 further includes a filter module 340 directly coupled to the signature generator 320 and the signature database 330 and indirectly coupled to each communication parser 311 and 312. In one embodiment, the filter module 340 determines whether each network transmission item was generated in the external network 100 and can be transmitted to the external destination node, as described in detail below and in connection with FIGS. 4 and 5.

[0067] In one embodiment, if a network transmission carries a data payload from an external source node within the external network 100 to an internal destination node within the internal network 106, the communication parser 311 receives the communication stream transmitted by the external source node, for example data packets transmitted by a client 102 or the server 104 within WAN 100. The communication parser 311 processes the data payload, which includes, among other things, protocol headers and metadata information, identifies multiple network transmission items, such as data fields of a specified type, and transmits each network transmission item to the signature generator 320. The signature generator 320 generates one or more item signatures for each network transmission item and stores each item signature in the signature database 330 along with other associated metadata information.

[0068] In one embodiment, each item signature is compared to each signature already stored within the signature database 330 to determine if a match already exists. If no match exists, then the item signature is stored within the signature database 330.
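
The inbound path of paragraphs [0067] and [0068] can be sketched as a compare-then-store loop. This assumes SHA-1 signatures (the algorithm named in the example of paragraph [0073]), a plain dictionary as the signature database, and that every inbound item is relayed; the metadata layout and the `seen` counter are hypothetical.

```python
import hashlib


def record_inbound_items(items: list[bytes], signature_db: dict, source_ip: str) -> list[bytes]:
    """Record signatures of items arriving from the external network, then relay them.

    signature_db maps SHA-1 hex digest -> metadata dict (hypothetical layout).
    """
    for item in items:
        signature = hashlib.sha1(item).hexdigest()
        if signature in signature_db:
            # A match already exists: nothing new to store, just count the sighting
            signature_db[signature]["seen"] += 1
        else:
            signature_db[signature] = {"origin": "external", "source": source_ip, "seen": 1}
    return list(items)  # in this sketch every inbound item is forwarded
```

A policy layer could narrow the returned list to implement the "selective" transmission of paragraph [0069]; the sketch leaves that out.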

[0069] In one embodiment, the item signatures within the signature database 330 create a record of the network transmission items generated in the external network 100. Finally, the network transmission items are selectively transmitted to the internal destination node within the internal network, for example to a client 103 within LAN 106. In one embodiment, only the signatures of processed network transmission items are placed in the database, and metadata is discarded.

[0070] In an alternate embodiment, if a network communication is initiated between an internal source node and an external destination node, the communication parser 312 receives the network transmission from the internal source node within the internal network, for example data payloads transmitted by a client 103 within LAN 106. The communication parser 312 processes each data payload, identifies multiple network transmission items, such as data fields of a specified type, and transmits each network transmission item to the signature generator 320. The signature generator 320 generates one or more item signatures for each network transmission item and transmits each network transmission item and its corresponding item signature to the filter module 340 for a determination of whether the particular item originated from the external network WAN 100.

[0071] The filter module 340 accesses the signature database 330 and compares each item signature corresponding to a network transmission item to the already stored signatures. If an item signature matches one or more signatures within the signature database 330, i.e. the corresponding network transmission item originated in the external network, the filter module 340 selectively transmits the network transmission item to the external destination node within the external network. If the item signature does not match any of the stored signatures, i.e. the corresponding network transmission item may be internally created data or intellectual property (e.g. code, documents, multimedia, etc.), the filter module 340 may block the network transmission item immediately, or may transmit the network transmission item to a processing module 350 for further examination, analysis, and action processing, such as, for example, subjecting the data to a pattern-matching test for confidential information not to be released externally. In one embodiment, the processing module 350 performs content analysis to assess the risks of transmission to the external destination node. Alternatively, the processing module 350 may perform any of a number of known security processing tasks to achieve the same result.
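
The filter module's decision can be sketched as a function returning one of three verdicts. It assumes SHA-1 signatures, a dictionary database whose records carry an `origin` field, and a configuration flag choosing between immediate blocking and hand-off to the processing module; all names are hypothetical.

```python
import hashlib
from enum import Enum


class Verdict(Enum):
    FORWARD = "forward"   # known external content: pass without further checks
    BLOCK = "block"       # unknown content, blocked immediately
    INSPECT = "inspect"   # unknown content, sent to the processing module


def filter_outbound(item: bytes, signature_db: dict, inspect_unknown: bool = True) -> Verdict:
    """Decide what to do with one outbound network transmission item."""
    record = signature_db.get(hashlib.sha1(item).hexdigest())
    if record is not None and record.get("origin") == "external":
        return Verdict.FORWARD
    # Possibly internally created data: block it, or escalate for analysis
    return Verdict.INSPECT if inspect_unknown else Verdict.BLOCK
```

Only the fast path (a dictionary lookup on a digest) is taken for previously seen external content, which is the efficiency gain the gateway is meant to provide.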

[0072] By way of illustration, suppose a user transfers an image of a beloved pet from an external network node to an internal network node of a company. The network transmission containing the digital image passes from the external network 100 to the internal network 106 through the network gateway device 108, as described in detail above according to one embodiment of the invention.

[0073] The image constitutes the data payload of the network transmission. The gateway device 108 relays the network transmission, but also identifies the data payload (the digital image) as a network transmission item and computes a signature over it using a cryptographic secure hash algorithm (e.g. SHA1). The signature is stored in the signature database 330 associated with the gateway device 108, along with information indicating that the data payload originated on an external network node. This information can be determined by examining the source and destination IP addresses, for example, or can be provided by the gateway device 108 based on the port upon which the transmission was received. Suppose the user later decides to send an image from the internal network node to some external network node, for example a friend's computer. As the image leaves the internal network 106 within a network transmission, it must pass through the network gateway device 108. The gateway device 108 buffers the outgoing network transmission and computes a signature using the same cryptographic hash algorithm over the same portions of the data payload (the image). The resulting signature is used to access the database 330. If the database 330 contains the same signature and indicates that the image originated in the external network 100, the policy implemented on the device is simply to forward the image to the specified destination; no further processing is required, and the transaction is not logged because it is innocuous. This would be the case, for instance, if the user resent the pet image that was known to have originated from the external network 100. If, however, the signature does not match any signature stored in the database 330, further security processing checks are run.
For example, computationally expensive steganographic detection checks may be used to determine whether some company secret has been surreptitiously embedded in the image by a malicious individual. Further processing, including logging, alerts, human intervention, etc., may result in the network transmission being passed on to the external network node, denied, or detained for further examination. Alternatively, the signature may be present in the database 330, but the associated information may indicate that the payload is company-private or licensed material and, as a result, the network transmission is denied. These subsequent procedures are typically dictated by associated policies. The gateway device 108 is able to filter network transmissions so as to reduce the need to check all network traffic, which incurs costs in terms of network latency, network bandwidth, time, space, human labor, log file sizes and monitoring, etc. The gateway device 108 also frees up resources to focus on, for example, true security threats, and can serve as a hook for other logging and monitoring purposes. Moreover, for some policies, such as permitting externally sourced information to be retransmitted by an internal network node to an external network node, all database updates and tracking can be done automatically, without human intervention or prior specification.

[0074] FIG. 4 is a flow diagram of one embodiment of a method to enable efficient processing and transmission of network communications. As illustrated in FIG. 4, in one embodiment, at processing block 410, the data payloads are received through inbound traffic from an external source node within an external network.

[0075] At processing block 420, network transmission items within each data payload are identified. At processing block 430, one or more item signatures are generated for each identified network transmission item.

[0076] At processing block 440, each item signature of a corresponding network transmission item is stored within one or more signature databases. In one embodiment, a comparison is first performed with signatures already stored within the one or more signature databases to determine if a match with the item signature exists. Finally, at processing block 450, the data payloads containing the identified network transmission items are selectively transmitted to an internal destination node within an internal network.

[0077] In one embodiment, the method illustrated in connection with FIG. 4 may alternatively be used to handle outbound traffic from an internal node to an external node.

[0078] FIG. 5 is a flow diagram of an alternate embodiment of the method to enable efficient processing and transmission of network communications. As illustrated in FIG. 5, at processing block 510, data payloads are received through outbound traffic from an internal source node within the internal network.

[0079] At processing block 520, network transmission items within each data payload are identified. At processing block 530, one or more item signatures are generated for each identified network transmission item.

[0080] At processing block 540, a signature database lookup is performed for each item signature to find one or more matching signatures. At processing block 550, a decision is made whether a matching signature is found for the particular item signature.

[0081] If a matching signature is found, at processing block 560, the corresponding network transmission item is transmitted to an external destination node within the external network, and blocks 540 and 550 are repeated for the next item signature.

[0082] Otherwise, if a matching signature is not found, at processing block 570, the corresponding network transmission item is blocked and transmitted to a processing module for further processing, and blocks 540 and 550 are repeated for the next item signature.
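
The per-item loop of FIG. 5 (blocks 530 to 570) can be sketched for one outbound payload as follows. It assumes the items have already been identified (block 520), uses SHA-1 signatures and a set of known signatures as the database, and models the processing module as a quarantine list; these are illustrative choices only.

```python
import hashlib


def process_outbound_payload(items: list[bytes], known_signatures: set[str]):
    """Apply blocks 530-570 of FIG. 5 to every item of one outbound payload.

    Returns (forwarded items, items routed to the processing module).
    """
    forwarded, quarantined = [], []
    for item in items:                              # blocks 540/550 repeat per item
        signature = hashlib.sha1(item).hexdigest()  # block 530
        if signature in known_signatures:           # blocks 540 and 550
            forwarded.append(item)                  # block 560: transmit externally
        else:
            quarantined.append(item)                # block 570: block and escalate
    return forwarded, quarantined
```

Because the decision is made per item, a payload mixing known public content with unknown content can be split rather than rejected wholesale.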

[0083] In one embodiment, the method illustrated in connection with FIG. 5 may alternatively be used to handle inbound traffic from an external node to an internal node. For example, filtering of inbound network transmissions, often based on policy, may be performed using the described processing blocks of the method.

[0084] In one embodiment of the invention, network transmission items that are not found in the approved signature database are sent to another analysis mechanism for further analysis. Hardware, software, and human analysis may be employed, and items may be permitted based on a human decision or the result of some additional processing step. In one embodiment, the result of such additional processing is an item signature, which is provided to the signature database in order to permit future traffic of a similar network transmission item.
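
A sketch of the feedback path: an unmatched item is handed to an analysis step, and on approval its signature is added to the approved set so that later copies bypass analysis. The `analyze` callable is a hypothetical stand-in for the hardware, software, or human decision; SHA-256 is an assumed signature function. With a cryptographic hash, only byte-identical items benefit from the learned signature.

```python
import hashlib
from typing import Callable


def review_unknown(item: bytes, approved: set[str], analyze: Callable[[bytes], bool]) -> bool:
    """Return True if the item may pass, learning its signature on approval."""
    signature = hashlib.sha256(item).hexdigest()
    if signature in approved:
        return True              # already approved: no further analysis needed
    if analyze(item):            # expensive or human decision (stand-in callable)
        approved.add(signature)  # future identical items pass without analysis
        return True
    return False
```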

[0085] In one embodiment, an interface may be established for an administrator, or an external service, to provide signature data, such as item signatures and associated metadata information, for network transmission items that should be approved or prohibited. In one embodiment, approved items added by such an administrator or external service are permitted to traverse the gateway device (i.e., are passed/admitted through it) without any further processing.

[0086] In one embodiment, the signature database also includes a set of signatures of items to reject. If a transmission item destined for an internal or external node is received, and the item signature corresponding to the transmission item has a counterpart on the list of items to be rejected (i.e., transmission items to be dropped), then the network transmission item or the entire transmission may be prohibited. In one embodiment, the item signature and the source and destination addresses are stored for future use.
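
A minimal sketch of the reject-list check, assuming SHA-256 signatures and the policy of prohibiting the entire transmission when any item matches (the text also allows dropping only the offending item). Each match is logged with the source and destination addresses for future use; the log format is hypothetical.

```python
import hashlib


def check_reject_list(items: list[bytes], reject_signatures: set[str],
                      src: str, dst: str, incident_log: list) -> bool:
    """Return False (drop the whole transmission) if any item is on the reject list."""
    allowed = True
    for item in items:
        signature = hashlib.sha256(item).hexdigest()
        if signature in reject_signatures:
            incident_log.append((signature, src, dst))  # stored for future use
            allowed = False  # keep scanning so every offending item is logged
    return allowed
```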

[0087] In one embodiment, signatures and metadata such as the source and destination addresses, users, etc. are logged. An interface may be provided to present a network transmission item or corresponding item signature to the gateway device. The gateway device then provides data related to any potential traffic involving the network transmission item, such as whether and how it entered the internal network, and whether and how it has been sent out of the internal network.

[0088] In one embodiment, filter conditions may be specified that determine whether the contents should be blocked based upon the metadata associated with the signature. For instance, an ‘allow’ rule might indicate that any item which originated on ‘Publicserver’ should be allowed to pass. Alternatively, a ‘deny’ rule could specify that any Adobe® Portable Document Format (PDF) document originating from ‘PrivateServer’ should not be transmitted.
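
The two example rules can be expressed as metadata predicates, as sketched below. The rule structure, the use of the MIME type `application/pdf` to identify PDF documents, the deny-before-allow precedence, and the `"inspect"` default for unmatched items are all assumptions, not part of the described embodiment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                         # "allow" or "deny"
    origin: Optional[str] = None        # match on originating server, if set
    content_type: Optional[str] = None  # match on MIME type, if set

    def matches(self, meta: dict) -> bool:
        return ((self.origin is None or meta.get("origin") == self.origin) and
                (self.content_type is None or meta.get("content_type") == self.content_type))


def evaluate(rules: list[Rule], meta: dict, default: str = "inspect") -> str:
    # Deny rules are checked before allow rules (assumed precedence)
    for action in ("deny", "allow"):
        if any(r.action == action and r.matches(meta) for r in rules):
            return action
    return default


RULES = [
    Rule("allow", origin="Publicserver"),
    Rule("deny", origin="PrivateServer", content_type="application/pdf"),
]
```

Checking deny rules first means a broad allow rule cannot accidentally override a narrower prohibition; other precedence schemes (first match wins, most specific wins) are equally plausible.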

[0089] It is to be understood that embodiments of this invention may be used as or to support software programs executed upon some form of processing core (such as the CPU of a computer) or otherwise implemented or realized upon or within a machine or computer readable medium. A machine readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine readable medium includes read-only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); or any other type of media suitable for storing or transmitting information. While embodiments of the present invention have been described with reference to the Internet and the World Wide Web, the system and method described herein are equally applicable to other network infrastructures or other data communication systems.

[0090] In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

What is claimed is:
1. A method comprising: receiving a network transmission directed to at least one destination node within a network; identifying at least one network transmission item in said network transmission; generating at least one item signature associated with said at least one network transmission item; and determining whether said at least one network transmission item can be transmitted to said at least one destination node by further processing said at least one item signature.
2. The method according to claim 1, wherein said determining further comprises: comparing said at least one item signature to each signature of a plurality of signatures stored in at least one signature database; storing said at least one item signature in said at least one signature database if no match exists between said at least one item signature and said each signature; and selectively transmitting said at least one network transmission item to said at least one destination node, if said network is an internal network.
3. The method according to claim 2, further comprising: storing item metadata information associated with said at least one network transmission item together with said at least one item signature in said at least one signature database.
4. The method according to claim 1, wherein said determining further comprises: comparing said at least one item signature to each signature of a plurality of signatures stored in at least one signature database; and selectively transmitting said at least one network transmission item to said at least one destination node, if said network is an external network and said at least one item signature matches a signature within said at least one signature database.
5. The method according to claim 4, wherein said transmitting further comprises: comparing item metadata information associated with said at least one network transmission item to metadata stored within said at least one item database and corresponding to said signature; and transmitting said at least one network transmission item to said at least one destination node, if said metadata does not prohibit transmission of said at least one network transmission item.
6. The method according to claim 5, wherein said metadata is inserted within said at least one signature database in conjunction with said each signature of said plurality of signatures.
7. The method according to claim 4, wherein said transmitting further comprises: comparing item metadata information associated with said at least one network transmission item to metadata stored within said at least one item database and corresponding to said signature; and blocking transmission of said at least one network transmission item to said at least one destination node, if said metadata prohibits said transmission.
8. The method according to claim 1, wherein said determining further comprises: comparing said at least one item signature to each signature of a plurality of signatures stored in at least one signature database; and transmitting said at least one network transmission item to a processing module for further processing, if said network is an external network and said at least one item signature does not match a signature within said at least one signature database.
9. The method according to claim 8, further comprising: blocking said at least one network transmission item from being transmitted to said at least one destination node.
10. The method according to claim 8, further comprising: selectively transmitting said at least one network transmission item to said at least one destination node; and inserting said at least one item signature in said at least one signature database.
11. The method according to claim 8, further comprising: logging said at least one network transmission item in a log database.
12. The method according to claim 1, wherein said at least one item signature further comprises a cryptographic hash algorithm.
13. An apparatus comprising: at least one communication parser to receive a network transmission directed to at least one destination node within a network and to identify at least one network transmission item in said network transmission; a signature generator coupled to said at least one communication parser to generate at least one item signature associated with said at least one network transmission item; and a filter module coupled to said signature generator to determine whether said at least one network transmission item can be transmitted to said at least one destination node by further processing said at least one item signature.
14. The apparatus according to claim 13, wherein said filter module further compares said at least one item signature to each signature of a plurality of signatures stored in at least one signature database, stores said at least one item signature in said at least one signature database if no match exists between said at least one item signature and said each signature, and selectively transmits said at least one network transmission item to said at least one destination node, if said network is an internal network.
15. The apparatus according to claim 14, wherein said filter module further stores item metadata information associated with said at least one network transmission item together with said at least one item signature in said at least one signature database.
16. The apparatus according to claim 13, wherein said filter module further compares said at least one item signature to each signature of a plurality of signatures stored in at least one signature database and selectively transmits said at least one network transmission item to said at least one destination node, if said network is an external network and said at least one item signature matches a signature within said at least one signature database.
17. The apparatus according to claim 16, wherein said filter module further compares item metadata information associated with said at least one network transmission item to metadata stored within said at least one item database and corresponding to said signature, and further transmits said at least one network transmission item to said at least one destination node, if said metadata does not prohibit transmission of said at least one network transmission item.
18. The apparatus according to claim 17, wherein said metadata is inserted within said at least one signature database in conjunction with said each signature of said plurality of signatures.
19. The apparatus according to claim 16, wherein said filter module further compares item metadata information associated with said at least one network transmission item to metadata stored within said at least one item database and corresponding to said signature, and blocks transmission of said at least one network transmission item to said at least one destination node, if said metadata prohibits said transmission.
20. The apparatus according to claim 13, wherein said filter module further compares said at least one item signature to each signature of a plurality of signatures stored in at least one signature database and further transmits said at least one network transmission item to a processing module coupled to said filter module for further processing, if said network is an external network and said at least one item signature does not match a signature within said at least one signature database.
21. The apparatus according to claim 20, wherein said processing module further blocks said at least one network transmission item from being transmitted to said at least one destination node.
22. The apparatus according to claim 20, wherein said processing module further transmits selectively said at least one network transmission item to said at least one destination node and inserts said at least one item signature in said at least one signature database.
23. The apparatus according to claim 20, wherein said processing module further logs said at least one network transmission item in a log database.
24. The apparatus according to claim 13, wherein said at least one item signature further comprises a cryptographic hash algorithm.
25. A system comprising: a memory; and a processor coupled to said memory to receive a network transmission directed to at least one destination node within a network, to identify at least one network transmission item in said network transmission, to generate at least one item signature associated with said at least one network transmission item, and to determine whether said at least one network transmission item can be transmitted to said at least one destination node by further processing said at least one item signature.
26. The system according to claim 25, wherein said processor compares said at least one item signature to each signature of a plurality of signatures stored in at least one signature database within said memory, stores said at least one item signature in said at least one signature database if no match exists between said at least one item signature and said each signature, and selectively transmits said at least one network transmission item to said at least one destination node, if said network is an internal network.
27. The system according to claim 26, wherein said processor further stores item metadata information associated with said at least one network transmission item together with said at least one item signature in said at least one signature database.
28. The system according to claim 25, wherein said processor further compares said at least one item signature to each signature of a plurality of signatures stored in at least one signature database and selectively transmits said at least one network transmission item to said at least one destination node, if said network is an external network and said at least one item signature matches a signature within said at least one signature database.
29. The system according to claim 28, wherein said processor further compares item metadata information associated with said at least one network transmission item to metadata stored within said at least one item database and corresponding to said signature, and further transmits said at least one network transmission item to said at least one destination node, if said metadata does not prohibit transmission of said at least one network transmission item.
30. The system according to claim 29, wherein said metadata is inserted within said at least one signature database in conjunction with said each signature of said plurality of signatures.
31. The system according to claim 28, wherein said processor further compares item metadata information associated with said at least one network transmission item to metadata stored within said at least one item database and corresponding to said signature, and blocks transmission of said at least one network transmission item to said at least one destination node, if said metadata prohibits said transmission.
 32. An apparatus comprising: means for receiving a networktransmission directed to at least one destination node within a network;means for identifying at least one network transmission item in saidnetwork transmission; means for generating at least one item signatureassociated with said at least one network transmission item; and meansfor determining whether said at least one network transmission item canbe transmitted to said at least one destination node by furtherprocessing said at least one item signature.
 33. The apparatus accordingto claim 32, further comprising: means for comparing said at least oneitem signature to each signature of a plurality of signatures stored inat least one signature database; means for storing said at least oneitem signature in said at least one signature database if no matchexists between said at least one item signature and said each signature;and means for selectively transmitting said at least one networktransmission item to said at least one destination node, if said networkis an internal network.
 34. The apparatus according to claim 33, furthercomprising: means for storing item metadata information associated withsaid at least one network transmission item together with said at leastone item signature in said at least one signature database.
 35. Theapparatus according to claim 32, further comprising: means for comparingsaid at least one item signature to each signature of a plurality ofsignatures stored in at least one signature database; and means forselectively transmitting said at least one network transmission item tosaid at least one destination node, if said network is an externalnetwork and said at least one item signature matches a signature withinsaid at least one signature database.
 36. The apparatus according toclaim 35, further comprising: means for comparing item metadatainformation associated with said at least one network transmission itemto metadata stored within said at least one item database andcorresponding to said signature; and means for transmitting said atleast one network transmission item to said at least one destinationnode, if said metadata does not prohibit transmission of said at leastone network transmission item.
 37. The apparatus according to claim 36,wherein said metadata is inserted within said at least one signaturedatabase in conjunction with said each signature of said plurality ofsignatures.
 38. The apparatus according to claim 35, further comprising:means for comparing item metadata information associated with said atleast one network transmission item to metadata stored within said atleast one item database and corresponding to said signature; and meansfor blocking transmission of said at least one network transmission itemto said at least one destination node, if said metadata prohibits saidtransmission.
 39. The apparatus according to claim 32, furthercomprising: means for comparing said at least one item signature to eachsignature of a plurality of signatures stored in at least one signaturedatabase; and means for transmitting said at least one networktransmission item to a processing module for further processing, if saidnetwork is an external network and said at least one item signature doesnot match a signature within said at least one signature database. 40.The apparatus according to claim 39, further comprising: means forblocking said at least one network transmission item from beingtransmitted to said at least one destination node.
41. The apparatus according to claim 39, further comprising: means for selectively transmitting said at least one network transmission item to said at least one destination node; and means for inserting said at least one item signature in said at least one signature database.
42. The apparatus according to claim 39, further comprising: means for logging said at least one network transmission item in a log database.
43. A computer readable medium containing executable instructions, which, when executed in a processing system, cause said processing system to perform a method comprising: receiving a network transmission directed to at least one destination node within a network; identifying at least one network transmission item in said network transmission; generating at least one item signature associated with said at least one network transmission item; and determining whether said at least one network transmission item can be transmitted to said at least one destination node by further processing said at least one item signature.
44. The computer readable medium according to claim 43, wherein said determining further comprises: comparing said at least one item signature to each signature of a plurality of signatures stored in at least one signature database; storing said at least one item signature in said at least one signature database if no match exists between said at least one item signature and said each signature; and selectively transmitting said at least one network transmission item to said at least one destination node, if said network is an internal network.
45. The computer readable medium according to claim 44, wherein said method further comprises: storing item metadata information associated with said at least one network transmission item together with said at least one item signature in said at least one signature database.
46. The computer readable medium according to claim 43, wherein said determining further comprises: comparing said at least one item signature to each signature of a plurality of signatures stored in at least one signature database; and selectively transmitting said at least one network transmission item to said at least one destination node, if said network is an external network and said at least one item signature matches a signature within said at least one signature database.
47. The computer readable medium according to claim 46, wherein said transmitting further comprises: comparing item metadata information associated with said at least one network transmission item to metadata stored within said at least one item database and corresponding to said signature; and transmitting said at least one network transmission item to said at least one destination node, if said metadata does not prohibit transmission of said at least one network transmission item.
48. The computer readable medium according to claim 47, wherein said metadata is inserted within said at least one signature database in conjunction with said each signature of said plurality of signatures.
49. The computer readable medium according to claim 46, wherein said transmitting further comprises: comparing item metadata information associated with said at least one network transmission item to metadata stored within said at least one item database and corresponding to said signature; and blocking transmission of said at least one network transmission item to said at least one destination node, if said metadata prohibits said transmission.
50. The computer readable medium according to claim 43, wherein said determining further comprises: comparing said at least one item signature to each signature of a plurality of signatures stored in at least one signature database; and transmitting said at least one network transmission item to a processing module for further processing, if said network is an external network and said at least one item signature does not match a signature within said at least one signature database.
51. The computer readable medium according to claim 50, wherein said method further comprises: blocking said at least one network transmission item from being transmitted to said at least one destination node.
52. The computer readable medium according to claim 50, wherein said method further comprises: selectively transmitting said at least one network transmission item to said at least one destination node; and inserting said at least one item signature in said at least one signature database.
53. The computer readable medium according to claim 50, wherein said method further comprises: logging said at least one network transmission item in a log database.
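The decision flow recited in claims 43 through 53 (and mirrored by the apparatus claims above) can be illustrated with a minimal Python sketch. It is a hypothetical illustration under stated assumptions, not the claimed implementation. The item signature is assumed to be a SHA-256 digest of the item bytes. The signature database and log database are assumed to be in-memory structures. The comparison of item metadata against stored metadata is reduced to checking a single stored flag, "prohibited", which is an assumed key. The names SignatureGateway, process, release and destination_internal are illustrative only and do not come from the claims.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SignatureGateway:
    """Hypothetical gateway illustrating signature-based transmission decisions."""

    # Assumed in-memory signature database: item signature -> item metadata.
    signature_db: dict = field(default_factory=dict)
    # Assumed in-memory log database: (signature, note) tuples.
    log_db: list = field(default_factory=list)

    @staticmethod
    def item_signature(item: bytes) -> str:
        # Assumption: the item signature is a SHA-256 digest of the item bytes.
        return hashlib.sha256(item).hexdigest()

    def process(self, item: bytes, destination_internal: bool,
                metadata: Optional[dict] = None) -> str:
        sig = self.item_signature(item)
        stored = self.signature_db.get(sig)

        if destination_internal:
            # Internal network (claims 44-45): store the signature together
            # with its metadata only if no match exists, then transmit.
            if stored is None:
                self.signature_db[sig] = dict(metadata or {})
            return "transmit"

        if stored is not None:
            # External network with a matching signature (claims 46-49): the
            # stored metadata decides; "prohibited" is an assumed key.
            return "block" if stored.get("prohibited", False) else "transmit"

        # External network, no match (claims 50-53): hand the item to a
        # further-processing step; this sketch only logs it.
        self.log_db.append((sig, "unmatched item sent for further processing"))
        return "further_processing"

    def release(self, item: bytes, metadata: Optional[dict] = None) -> str:
        # Claim 52 outcome: transmit the reviewed item and insert its signature.
        self.signature_db[self.item_signature(item)] = dict(metadata or {})
        return "transmit"


if __name__ == "__main__":
    gw = SignatureGateway()
    print(gw.process(b"quarterly report", destination_internal=True))   # transmit
    print(gw.process(b"quarterly report", destination_internal=False))  # transmit
    print(gw.process(b"unknown file", destination_internal=False))      # further_processing
```

The sketch returns a decision string instead of forwarding traffic, which keeps it independent of any particular firewall or proxy interface. A real gateway would act on that decision and would likely use a persistent database in place of the in-memory dictionary and list.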